Using multiple linguistic features for Mandarin phrase break prediction in maximum-entropy classification framework
Authors
Abstract
We model Mandarin phrase break prediction as a classification problem over a three-level prosodic structure and apply conditional maximum entropy (ME) classification to it. We acquire multiple levels of linguistic knowledge from an annotated corpus and integrate them as features in the maximum entropy framework. Five kinds of features represent the various linguistic constraints: POS tag features, lexical features, phonetic features, length features, and distance features. Experimental results show that our method outperforms previous methods and that the conditional ME model is very effective at handling the data sparseness problem in Mandarin phrase break prediction.
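The abstract does not give the feature templates or the training algorithm, so the following is only a minimal sketch, assuming each juncture between two adjacent words is one classification instance. It shows how the five feature kinds could become binary features of a conditional ME model, p(y|x) = exp(Σ w[f, y]) / Z(x). The label names B0 to B2, the hypothetical (token, pos, final_tone) word tuple, the feature templates, and the use of stochastic gradient ascent (swapped in for the iterative scaling or quasi-Newton training commonly used for ME models) are all assumptions, not the authors' method.

```python
import math

# Hypothetical three-level break labels (e.g. prosodic word / prosodic phrase /
# intonational phrase boundary); the paper's actual label names are not given.
LABELS = ("B0", "B1", "B2")


def juncture_features(words, i):
    """Binary features for the juncture between words[i] and words[i + 1].

    Each word is a hypothetical (token, pos, final_tone) tuple. One Chinese
    character is treated as one syllable. The templates are illustrative only.
    """
    left, right = words[i], words[i + 1]
    syl_from_start = sum(len(w[0]) for w in words[: i + 1])
    syl_to_end = sum(len(w[0]) for w in words[i + 1:])
    return [
        "pos_l=" + left[1], "pos_r=" + right[1],          # POS tag features
        "pos_lr=" + left[1] + "_" + right[1],
        "word_l=" + left[0], "word_r=" + right[0],        # lexical features
        "tone_l=" + str(left[2]),                         # phonetic feature
        "len_l=" + str(len(left[0])),                     # length features
        "len_r=" + str(len(right[0])),
        "dist_start=" + str(min(syl_from_start, 10)),     # distance features (capped)
        "dist_end=" + str(min(syl_to_end, 10)),
    ]


class MaxEnt:
    """Conditional ME model: p(y|x) = exp(sum_f w[f, y]) / Z(x) over active features f."""

    def __init__(self, labels):
        self.labels = labels
        self.w = {}  # weight for each (feature, label) pair; missing pairs are 0

    def probs(self, feats):
        scores = [sum(self.w.get((f, y), 0.0) for f in feats) for y in self.labels]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, data, epochs=50, lr=0.1, l2=0.01):
        # Stochastic gradient ascent on the conditional log-likelihood; the L2
        # penalty (a Gaussian prior) is applied only to active features.
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for k, y in enumerate(self.labels):
                    # Gradient for an active feature: observed minus expected count.
                    grad = (1.0 if y == gold else 0.0) - p[k]
                    for f in feats:
                        old = self.w.get((f, y), 0.0)
                        self.w[(f, y)] = old + lr * (grad - l2 * old)

    def predict(self, feats):
        p = self.probs(feats)
        return self.labels[max(range(len(p)), key=p.__getitem__)]


# Toy sentence with made-up POS tags, tones and break labels.
sentence = [("我们", "r", 3), ("今天", "t", 1), ("去", "v", 4), ("北京", "ns", 1)]
gold = ["B1", "B2", "B0"]
data = [(juncture_features(sentence, i), gold[i]) for i in range(len(sentence) - 1)]

model = MaxEnt(LABELS)
model.train(data)
print([model.predict(f) for f, _ in data])
```

Because each weight is tied to a (feature, label) pair, a rare lexical feature only contributes when it is observed, and the L2 penalty keeps the weights of such sparse features small; this is one common way ME classifiers cope with limited training data.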
Similar resources
Incorporating second-order information into two-step major phrase break prediction for Korean
In this paper, we present a new phrase break prediction method that integrates second-order information into a general maximum entropy model. In our research, the phrase break prediction problem was mapped to a classification problem. The features we used for the prediction of phrase breaks are of several layers such as local features (part-of-speech (POS) tags, a lexicon, lengths of eojeols and...
Phrase break prediction using logistic generalized linear model
In this paper, we propose a novel phrase break prediction model for Mandarin speech synthesis: a generalized linear model (GLM) solved by stepwise regression. We assume that phrase breaks follow a Bernoulli distribution and model the phrase break probability with a logistic GLM. The attribute set is selected automatically by stepwise regression, a fully data-driven method. We also introd...
Chinese prosody phrase break prediction based on maximum entropy model
A maximum-entropy-based model for prosodic phrase break prediction was proposed in this paper and compared, on large corpora, with the decision-tree-based model, which was the mainstream method for prosodic phrase break prediction. The contribution of lexical information and the influence of different cutoff values were also investigated. It was demonstrated that...
Phrase Break Prediction Using a Finite State Transducer
This paper presents a method for phrase break prediction using a finite state transducer. In the literature, several algorithms have been proposed using statistical techniques for predicting phrase breaks. Some of these methods rely on linguistic information, such as syllables, words, part-of-speech, accents, etc. Our proposal is a probabilistic finite state transducer to convert part-of-speech ...